Incremental discovery of the irredundant motif bases for all suffixes of a string in O(n^2 log n) time

Authors

  • Alberto Apostolico
  • Claudia Tagliacollo
Abstract

Compact bases formed by motifs called irredundant and capable of generating all other motifs in a sequence have been proposed in recent years and successfully tested in tasks of biosequence analysis and classification. Given a sequence s of n characters drawn from an alphabet Σ, the problem of extracting such a base from s had been previously solved in time O(n^2 log n log |Σ|) and O(|Σ| n^2 log n log log n), respectively, by resorting to the FFT-based string searching of Fischer and Paterson. More recently, a solution taking time O(|Σ| n^2) without resorting to the FFT was also proposed. The present paper considers the problem of extracting the bases of all suffixes of a string incrementally. This problem was solved in previous work in time O(n^3). A much faster incremental algorithm is described here, which takes time O(|Σ| n^2 log n). Although this algorithm also does not use the FFT, its performance is comparable to that of the previous FFT-based algorithms, which compute only one base. The implicit representation of a single base requires O(n) space, whence for finite alphabets the proposed solution is within a log n factor of optimality.
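
The optimality remark that closes the abstract can be unpacked as a short calculation. What follows is only a sketch of the reasoning, under the assumption that the O(n) space bound for one implicitly represented base is attained in the worst case, so that merely writing out the bases of all n suffixes costs on the order of n^2. The symbol T(n) for the running time is introduced here for illustration and does not appear in the abstract; the align* environment assumes the amsmath package.

```latex
\begin{align*}
\text{total output size} &= \sum_{i=1}^{n} \Theta(i) = \Theta(n^2)
  && \text{(one base per suffix, worst case, assumed tight)} \\
T(n) &= O\!\left(|\Sigma|\, n^2 \log n\right)
  && \text{(bound stated in the abstract)} \\
\frac{T(n)}{n^2} &= O\!\left(|\Sigma| \log n\right) = O(\log n)
  && \text{(for a finite alphabet, } |\Sigma| = O(1)\text{)}
\end{align*}
```

Under that assumption, no algorithm that outputs every base can run asymptotically faster than n^2, which is why the stated bound is described as being within a log n factor of optimal.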

Similar resources

Incremental Paradigms of Motif Discovery

We examine the problem of extracting maximal irredundant motifs from a string. A combinatorial argument poses a linear bound on the total number of such motifs, thereby opening the way to the quest for the fastest and most efficient methods of extraction. The basic paradigm explored here is that of iterated updates of the set of irredundant motifs in a string under consecutive unit symbol exten...

Development of an Efficient Hybrid Method for Motif Discovery in DNA Sequences

This work presents a hybrid method for motif discovery in DNA sequences. The proposed method called SPSO-Lk, borrows the concept of Chebyshev polynomials and uses the stochastic local search to improve the performance of the basic PSO algorithm as a motif finder. The Chebyshev polynomial concept encourages us to use a linear combination of previously discovered velocities beyond that proposed b...

Remote Homology Detection of Protein Sequences

The automatic classification of protein sequences into families is of great help for the functional prediction and annotation of new proteins. In this paper we present a method called Irredundant Class that addresses the remote homology detection problem. The best performing methods that solve this problem are string kernels, which compute a similarity function between pairs of proteins based on th...

Some Geometrical Bases for Incremental-Iterative Methods (RESEARCH NOTE)

Finding the equilibrium path by non-linear structural analysis is one of the most important subjects in structural engineering. Incremental-iterative methods are widely used for this purpose. This paper introduces several factors in incremental steps. In addition, it suggests some control criteria for the iterative part of the non-linear analysis. These techniques are based on the geometric of e...

Bridging Lossy and Lossless Compression by Motif Pattern Discovery

We present data compression techniques hinged on the notion of a motif, interpreted here as a string of intermittently solid and wild characters that recurs more or less frequently in an input sequence or family of sequences. This notion arises originally in the analysis of sequences, particularly biomolecules, due to its multiple implications in the understanding of biological structure and fu...

Journal title:
  • Theor. Comput. Sci.

Volume 408

Publication date 2008